[WIP] mHC: Manifold-constrained Hyper Connection #1859

anhminhnguyenhoang · 2026-01-16T15:15:28Z

Co-authors: @waqahmed-amd-fi @anhminhnguyenhoang

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

… kernel (#1877)

* Refactor mHC kernel and wrapper to implement equations 14-18 with fused kernel
* improve comments
* Enhance documentation for mhc function: clarify equations, input/output shapes, and activation details
* Enhance documentation in test_mhc.py: clarify equations, input/output shapes, and activation details for mHC kernel tests
* Add _sinkhorn_knopp_log_domain_kernel to the fusion module
* Add logging and sync Sinkhorn-Knopp function for doubly stochastic matrices
* sync log-domain Sinkhorn-Knopp kernel for doubly stochastic matrix projection
* Improve logging in mhc function to include all alpha parameters
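The commit notes above mention a log-domain Sinkhorn-Knopp kernel for projecting onto doubly stochastic matrices. As a rough CPU reference only, here is a NumPy sketch of that iteration for a batch of n×n logit matrices, assuming rows and columns are normalized alternately in log space; `sinkhorn_knopp_log` and its `num_iters` default are illustrative names, not the Triton kernel's API:

```python
import numpy as np


def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis, keeping dims for broadcasting.
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def sinkhorn_knopp_log(logits, num_iters=10):
    """Hypothetical reference: push (B, n, n) logits toward doubly stochastic matrices.

    All normalization happens in log space, so large logits do not overflow exp().
    """
    log_p = np.asarray(logits, dtype=np.float64)
    for _ in range(num_iters):
        log_p = log_p - logsumexp(log_p, axis=-1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=-2)  # each column sums to 1
    return np.exp(log_p)


rng = np.random.default_rng(0)
H = sinkhorn_knopp_log(rng.standard_normal((2, 4, 4)), num_iters=20)
print(H.sum(axis=-1))  # row sums, close to 1
print(H.sum(axis=-2))  # column sums, 1 up to rounding (last step normalizes columns)
```

Because the last step of each iteration normalizes columns, column sums are exact up to rounding, while row sums only converge toward 1 as the iteration count grows.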

github-actions · 2026-01-20T13:04:12Z

aiter/ops/triton/fusions/mhc.py

+    else:
+        assert out.shape == (M, N), f"Output shape mismatch: expected ({M}, {N}), got {out.shape}"
+        assert out.dtype == x.dtype, f"Output dtype mismatch: expected {x.dtype}, got {out.dtype}"
+        assert out.device == x.device, f"Output device mismatch"


⚠️ [ruff] <F541> (reported by reviewdog 🐶)
f-string without any placeholders

Suggested change

- assert out.device == x.device, f"Output device mismatch"
+ assert out.device == x.device, "Output device mismatch"

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/fusions/test_mhc.py

+
+    # Res-stream: no constraints (identity activation)
+    # Just verify it exists
+    assert out_res.shape == (M, n_squared), f"Res-stream shape mismatch"


⚠️ [ruff] <F541> (reported by reviewdog 🐶)
f-string without any placeholders

Suggested change

- assert out_res.shape == (M, n_squared), f"Res-stream shape mismatch"
+ assert out_res.shape == (M, n_squared), "Res-stream shape mismatch"

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/__init__.py

 # SPDX-License-Identifier: MIT
 # Copyright (C) 2024-2025, Advanced Micro Devices, Inc. All rights reserved.

+from .mhc_ref import *


⚠️ [ruff] <F403> (reported by reviewdog 🐶)
from .mhc_ref import * used; unable to detect undefined names

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/__init__.py

 # Copyright (C) 2024-2025, Advanced Micro Devices, Inc. All rights reserved.

+from .mhc_ref import *
 from .mla_decode_ref import *


⚠️ [ruff] <F403> (reported by reviewdog 🐶)
from .mla_decode_ref import * used; unable to detect undefined names

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/__init__.py


+from .mhc_ref import *
 from .mla_decode_ref import *
 from .mla_extend_ref import *


⚠️ [ruff] <F403> (reported by reviewdog 🐶)
from .mla_extend_ref import * used; unable to detect undefined names

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/__init__.py

+from .mhc_ref import *
 from .mla_decode_ref import *
 from .mla_extend_ref import *
 from .rotary_embedding import *


⚠️ [ruff] <F403> (reported by reviewdog 🐶)
from .rotary_embedding import * used; unable to detect undefined names

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/mhc_ref.py

+        - H^res: [2n:2n+n²] residual connection (identity) (n² elements)
+    """
+    x_f32 = x.to(torch.float32)
+    nC = x.shape[1]


⚠️ [ruff] <F841> (reported by reviewdog 🐶)
Local variable nC is assigned to but never used

Suggested change

- nC = x.shape[1]
+ x.shape[1]

github-actions · 2026-01-20T15:09:09Z

op_tests/triton_tests/utils/mhc_ref.py

+    H_tilde = x_norm @ phi_f32
+
+    # Split into three streams
+    n_squared = n * n


⚠️ [ruff] <F841> (reported by reviewdog 🐶)
Local variable n_squared is assigned to but never used

Suggested change

- n_squared = n * n
+ n * n
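The `mhc_ref.py` excerpts compute `H_tilde = x_norm @ phi_f32` and then split the result into three streams, with H^res in columns [2n:2n+n²]. A NumPy sketch of that split, assuming H^pre and H^post occupy [0:n] and [n:2n] (inferred from the H^res offset), that `x_norm` is an RMS-normalized `x`, and that any per-stream activation is applied afterwards; `split_streams` is a hypothetical name:

```python
import numpy as np


def split_streams(x, phi, n, eps=1e-6):
    """Hypothetical sketch: project x (M, nC) with phi (nC, 2n + n*n) and split the result."""
    x_f32 = x.astype(np.float32)
    # Assumption: RMS normalization over the nC (feature) dimension.
    rms = np.sqrt(np.mean(x_f32 * x_f32, axis=-1, keepdims=True) + eps)
    x_norm = x_f32 / rms
    H_tilde = x_norm @ phi.astype(np.float32)  # (M, 2n + n*n)

    n_squared = n * n
    H_pre = H_tilde[:, :n]  # (M, n)
    H_post = H_tilde[:, n:2 * n]  # (M, n)
    H_res = H_tilde[:, 2 * n:2 * n + n_squared].reshape(-1, n, n)  # (M, n, n)
    return H_pre, H_post, H_res


M, nC, n = 3, 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((M, nC))
phi = rng.standard_normal((nC, 2 * n + n * n))
pre, post, res = split_streams(x, phi, n)
print(pre.shape, post.shape, res.shape)  # (3, 4) (3, 4) (3, 4, 4)
```

Using `n_squared` in the slice bounds, as here, is one way to give the variable flagged by F841 a purpose instead of keeping a bare `n * n` expression.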

* Refactor mHC kernel and wrapper to implement equations 14-18 with fused kernel
* improve comments
* Enhance documentation for mhc function: clarify equations, input/output shapes, and activation details
* Enhance documentation in test_mhc.py: clarify equations, input/output shapes, and activation details for mHC kernel tests
* Add _sinkhorn_knopp_log_domain_kernel to the fusion module
* Add logging and sync Sinkhorn-Knopp function for doubly stochastic matrices
* sync log-domain Sinkhorn-Knopp kernel for doubly stochastic matrix projection
* Improve logging in mhc function to include all alpha parameters
* Fix H dimensions
* Refactor mHC function to return separate output tensors for pre, post, and residual streams
* Refactor mhc_torch to return separate output tensors for pre, post, and residual streams
* Adjust tolerance for is_doubly_stochastic assertion in test_sk_matrix_sizes for bfloat16 precision

anhminhnguyenhoang

Looks good. I would personally clean up the comments, as they look a bit redundant.

anhminhnguyenhoang · 2026-01-21T09:45:49Z

op_tests/triton_tests/fusions/test_mhc.py

        H_res_torch.to(torch.float32),
-        atol=1e-2,
-        rtol=1e-2,
+        atol=5e-2,


Did you run into test failures in this or similar tests that required relaxing the tolerance?

waqahmed-amd-fi · 2026-01-21

Yes, mainly because of Sinkhorn, which is an iterative process; with only 10 iterations the differences are larger. Maybe we can try 20 iterations for better results?
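The effect of the iteration count can be measured directly by checking how far the row and column sums are from 1 after a fixed number of Sinkhorn-Knopp steps. A NumPy sketch, assuming the plain (non-log-domain) form of the iteration and a max-absolute-deviation metric; `sinkhorn` and `doubly_stochastic_error` are hypothetical helpers, not the test suite's `is_doubly_stochastic`:

```python
import numpy as np


def sinkhorn(A, num_iters):
    """Plain Sinkhorn-Knopp on a positive matrix: alternate row and column normalization."""
    P = np.array(A, dtype=np.float64)
    for _ in range(num_iters):
        P /= P.sum(axis=1, keepdims=True)  # rows sum to 1
        P /= P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P


def doubly_stochastic_error(P):
    """Largest absolute deviation of any row or column sum from 1 (illustrative metric)."""
    return max(np.abs(P.sum(axis=1) - 1).max(), np.abs(P.sum(axis=0) - 1).max())


rng = np.random.default_rng(0)
A = np.exp(2.0 * rng.standard_normal((4, 4)))  # widely spread positive entries
for iters in (10, 20):
    print(iters, doubly_stochastic_error(sinkhorn(A, iters)))
```

For positive matrices the marginal error shrinks roughly geometrically, so going from 10 to 20 iterations should tighten the comparison noticeably at twice the cost; with bfloat16 inputs, rounding still limits how small the tolerance can get.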

- 2D grid parallelization: (M/BLOCK_M, n²/BLOCK_SIZE) avoiding sequential M-row loop, processing BLOCK_M batches simultaneously per thread block
- Vectorized softmax over n! permutations for BLOCK_M batches
- Outer product accumulation for weighted sum computation
- Config optimization: BLOCK_M=8, BLOCK_SIZE=16 (gfx942/gfx950)
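The launch layout and outer-product accumulation in these notes can be mirrored on the CPU to check the indexing. A NumPy sketch, assuming `probs` holds softmax weights for one BLOCK_M tile of rows (BLOCK_M × n!) and `perm_flat` holds the n! permutation matrices flattened to n² columns; `cdiv`, `launch_grid` and `tile_weighted_sum` are illustrative names:

```python
import itertools
import math

import numpy as np


def cdiv(a, b):
    # Ceiling division, used to size each grid axis.
    return (a + b - 1) // b


def launch_grid(M, n, BLOCK_M=8, BLOCK_SIZE=16):
    # Axis 0 tiles the M rows; axis 1 tiles the n*n entries of H^res.
    return (cdiv(M, BLOCK_M), cdiv(n * n, BLOCK_SIZE))


def tile_weighted_sum(probs, perm_flat):
    """Accumulate sum_k outer(probs[:, k], perm_flat[k]) one permutation at a time."""
    acc = np.zeros((probs.shape[0], perm_flat.shape[1]), dtype=np.float32)
    for k in range(perm_flat.shape[0]):
        acc += np.outer(probs[:, k], perm_flat[k])  # (BLOCK_M,) x (BLOCK_SIZE,)
    return acc


n = 4
perms = np.array([np.eye(n)[list(p)] for p in itertools.permutations(range(n))])
perm_flat = perms.reshape(math.factorial(n), n * n).astype(np.float32)  # (24, 16)
probs = np.full((8, perm_flat.shape[0]), 1.0 / perm_flat.shape[0], dtype=np.float32)
print(launch_grid(M=100, n=n))  # (13, 1)
print(np.allclose(tile_weighted_sum(probs, perm_flat), probs @ perm_flat))  # True
```

Accumulating one outer product per permutation is mathematically the same as the product `probs @ perm_flat`; the loop form only makes the per-permutation accumulation explicit.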

…ve sinkhorn matching the paper's method.
- Uses W^res ∈ ℝ^(nC×n!) instead of ℝ^(nC×n²)
- Computes softmax over n! logits (Equation 4)
- Constructs H^res as weighted sum of all n! permutation matrices (Equation 5)
- Results in doubly stochastic matrix by construction (no Sinkhorn iterations needed)
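This construction relies on the Birkhoff-von Neumann theorem: every convex combination of permutation matrices is doubly stochastic, so softmax weights over the n! permutations produce a valid H^res without iterating. A NumPy sketch of Equations 4 and 5 as described above, assuming `x` has shape (M, nC) and `W_res` has shape (nC, n!); `mhc_lite_res` is a hypothetical name, and any normalization, scaling or bias terms the real kernel applies are omitted:

```python
import itertools
import math

import numpy as np


def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mhc_lite_res(x, W_res, n):
    """H^res as a softmax-weighted sum of all n! permutation matrices (no Sinkhorn)."""
    assert W_res.shape[1] == math.factorial(n)
    perms = np.array([np.eye(n)[list(p)] for p in itertools.permutations(range(n))])  # (n!, n, n)
    weights = softmax(x @ W_res, axis=-1)  # (M, n!), each row sums to 1 (Equation 4)
    return np.einsum("mk,kij->mij", weights, perms)  # (M, n, n) (Equation 5)


rng = np.random.default_rng(0)
M, nC, n = 2, 8, 3
H_res = mhc_lite_res(rng.standard_normal((M, nC)), rng.standard_normal((nC, math.factorial(n))), n)
print(np.allclose(H_res.sum(axis=-1), 1.0), np.allclose(H_res.sum(axis=-2), 1.0))  # True True
```

The cost is the n! factor: 24 logits for n = 4, but 120 for n = 5 and 720 for n = 6, so this approach suits small stream counts.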

add initial implementation of projection mapping

c35c8ac

anhminhnguyenhoang assigned anhminhnguyenhoang and waqahmed-amd-fi Jan 16, 2026

anhminhnguyenhoang requested a review from a team January 16, 2026 15:15

anhminhnguyenhoang marked this pull request as draft January 16, 2026 15:15

anhminhnguyenhoang changed the title ~~mHC: Manifold-constrained Hyper Connection~~ [WIP] mHC: Manifold-constrained Hyper Connection Jan 16, 2026

anhminhnguyenhoang requested review from Chi-Chu319, hellozhuo-amd and juuso-oskari January 16, 2026 15:20

waqahmed-amd-fi and others added 5 commits January 19, 2026 04:57

Refactor mHC kernel and wrapper to include sigmoid activation in projection mapping

2a8325c

Add Sinkhorn-Knopp log-domain kernel implementation

f11244d

clean up sinkhorn-knopp tests

87e5839

review invalid test case

152bb21

github-actions bot reviewed Jan 20, 2026

waqahmed-amd-fi and others added 3 commits January 20, 2026 08:35

Fix H dims

36e36d6

fix test_mhc_output_range

6156c13

Refactor test cases in mHC and Sinkhorn-Knopp implementations and simplified comments for sinkhorn-knopp impl

80b8d34

github-actions bot reviewed Jan 20, 2026

waqahmed-amd-fi and others added 3 commits January 21, 2026 10:29

optimization to loads x_tile once, reducing memory bandwidth

d952829
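Loading `x_tile` once helps because all three streams multiply the same input tile: projecting with separate pre, post and residual weights gives the same result as one product against their column-wise concatenation. A NumPy sketch of that equivalence, where `phi_pre`, `phi_post` and `phi_res` are hypothetical names for weights of width n, n and n²:

```python
import numpy as np

rng = np.random.default_rng(0)
M, nC, n = 4, 32, 4
x_tile = rng.standard_normal((M, nC)).astype(np.float32)
phi_pre = rng.standard_normal((nC, n)).astype(np.float32)
phi_post = rng.standard_normal((nC, n)).astype(np.float32)
phi_res = rng.standard_normal((nC, n * n)).astype(np.float32)

# Three separate projections: conceptually, x_tile is read once per stream.
separate = [x_tile @ phi_pre, x_tile @ phi_post, x_tile @ phi_res]

# One fused projection: x_tile is read once and reused for every stream.
fused = x_tile @ np.concatenate([phi_pre, phi_post, phi_res], axis=1)
h_pre, h_post, h_res = np.split(fused, [n, 2 * n], axis=1)

print(all(np.allclose(a, b, atol=1e-5) for a, b in zip(separate, (h_pre, h_post, h_res))))  # True
```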

Update mHC implementation to apply Sinkhorn-Knopp (Equation 19) to make H_res doubly stochastic

30741ca

anhminhnguyenhoang commented Jan 21, 2026

waqahmed-amd-fi and others added 7 commits January 21, 2026 06:08

Refactor mHC implementation to separate projection (phi) matrices into pre, post, and residual streams; update tests accordingly.

686711f

Enhance mHC fused kernel to implement stream-aware processing

831572b

Refactor mHC implementation

ad198b3

Adjust tolerance levels in mHC tests based on input size to improve accuracy of assertions.

05810eb

Add benchmark scripts for mHC kernel performance evaluation

7d333a8

add modes to bench

46d023a

- Add naive configs for fused mHC and Sinkhorn-Knopp kernels
- Update benchmark script to use dynamic configurations

e8f4464

anhminhnguyenhoang and others added 25 commits January 23, 2026 08:33

switch to using exp2/log2 for sinkhorn-knopp for optimization

68a2df8
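Switching to `exp2`/`log2` relies on the identities exp(x) = 2^(x·log2 e) and ln(y) = log2(y)·ln 2, so a log-domain Sinkhorn-Knopp step can keep its logits scaled to base 2 and convert only at the boundaries; GPU instruction sets commonly implement the base-2 exponential natively, which is the usual motivation. A NumPy check that a base-2 logsumexp matches the natural-log version (the helper names are illustrative):

```python
import numpy as np

LOG2E = np.log2(np.e)  # log2(e): converts natural-log units to base-2 units
LN2 = np.log(2.0)  # ln(2): converts base-2 units back to natural-log units


def logsumexp_natural(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def logsumexp_base2(a):
    # Scale to base 2 so that exp(a - m) == exp2(a2 - m2).
    a2 = a * LOG2E
    m2 = a2.max()
    lse2 = m2 + np.log2(np.exp2(a2 - m2).sum())  # log2(sum(2 ** a2))
    return lse2 * LN2  # back to natural-log units


a = np.array([0.5, -1.0, 2.0, 3.5])
print(np.isclose(logsumexp_natural(a), logsumexp_base2(a)))  # True
```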

Sort benchmark configurations by hidden dimension and refine FLOPs calculations for clarity

df24377

Refactor Sinkhorn-Knopp kernel to support batch processing

a4a1793

better tuned configs

23609ed

Refactor mHC fused kernel for improved arithmetic operations and clarity in stream handling

514f3f5

Add split-K support to mHC kernel by new split and reduce kernels to parallelize computations along nC dimension with tests included

10b122a
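Split-K turns the single reduction over nC into independent partial products over chunks of nC, followed by a reduction pass that sums them. A NumPy sketch of the two stages, where `split_kernel`, `reduce_kernel` and the `split_k` chunk count are illustrative stand-ins rather than the Triton kernels:

```python
import numpy as np


def split_kernel(x, phi, split_k):
    """Stage 1: one partial (M, N) product per chunk of the nC dimension."""
    nC = x.shape[1]
    bounds = np.linspace(0, nC, split_k + 1).astype(int)  # chunk edges along nC
    return np.stack([x[:, lo:hi] @ phi[lo:hi, :] for lo, hi in zip(bounds[:-1], bounds[1:])])


def reduce_kernel(partials):
    """Stage 2: sum the partial products to recover the full product."""
    return partials.sum(axis=0)


rng = np.random.default_rng(0)
M, nC, N = 4, 1024, 24  # N = 2n + n*n for n = 4
x = rng.standard_normal((M, nC))
phi = rng.standard_normal((nC, N))
partials = split_kernel(x, phi, split_k=8)
print(partials.shape)  # (8, 4, 24)
print(np.allclose(reduce_kernel(partials), x @ phi))  # True
```

Split-K is typically worthwhile when M is small and nC is large, since splitting the reduction exposes more parallel work than tiling over M alone; the price is an extra pass and temporary storage for the partials.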

add better config with split reduce usage

9f6aba3

Apply optim in mhc_fused to split reduce kernels, rename functions for consistency,

833c47a

Add json config loading

70490ad

Add tuned JSON configuration files for fused mhc kernels

b01c00a

initial implementation of zero-iteration Sinkhorn-Knopp (mHC-Lite). Note that it is not optimized yet.

42c944d

optimized zero-iteration Sinkhorn-Knopp (mHC-Lite)

190122e

Removed Unused Projection Code (Wrapper Function)

507d95a

add config loading bug fix due to caching and better tuned configs

c2ccafc

add config loading bug fix due to caching and better tuned configs

3e85b9d

add mhc-lite

5374eb6

revised mHC and mHC-Lite description for clarity

4f4c272

update comments and replace if-else with assert check

57c75ac

revised _mhc_lite_fused_split_kernel kernel

68c76b0

revised _mhc_lite_fused_reduce_kernel

604426d

add mhc-lite bench mode

e52ebdc

integrate mhc-lite into mhc_fused

e70f3dd

update config loading for mode

41b4908
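The commits above add JSON config loading, fix a caching-related loading bug and make loading mode-aware; the messages do not describe the bug itself. As a general illustration of mode-aware cached loading, here is a sketch that assumes one JSON file per mode in a config directory; `load_config`, the file layout and the mode names are all hypothetical:

```python
import functools
import json
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_config(config_dir, mode):
    """Hypothetical loader: one JSON file per mode, cached per (config_dir, mode) key."""
    path = os.path.join(config_dir, f"{mode}.json")
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    for mode, block_m in (("mhc", 8), ("mhc_lite", 16)):
        with open(os.path.join(d, f"{mode}.json"), "w") as f:
            json.dump({"BLOCK_M": block_m}, f)
    # Because mode is part of the cache key, each mode gets its own cache entry.
    print(load_config(d, "mhc")["BLOCK_M"], load_config(d, "mhc_lite")["BLOCK_M"])  # 8 16
```

Including the mode in the cache key keeps configs for different modes from being served out of the same cache entry; because the cached dict is shared between calls, callers should treat it as read-only.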

3 participants
